Getting Started
Load testing helps ensure your Unpod applications can handle concurrent user traffic and maintain optimal performance under various load conditions.
The platform guarantees 99.99% uptime with automatic failover. See the SLA documentation for details.
Metric Commitment
| Metric | Commitment | Credit Policy |
|---|---|---|
| Uptime SLA | 99.90% available | 0.5% credit per 0.1% below |
| End-to-End Latency p99 | less than 1500ms | Included in E2E |
| WebApp Service Latency | less than 10ms internal routing | Included in E2E |
| Vector Store Query p99 | less than 50ms | Included in E2E |
| MongoDB Write (Fire and Forget) | less than 40ms | Included in E2E |
| Data Purge Verification | On-demand audit | Included (Enterprise) |
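
The uptime credit policy in the table (0.5% credit per 0.1% below the 99.90% commitment) can be worked through with a short calculation. The following is a minimal sketch, not the contractual formula: the function name `sla_credit` is hypothetical, and it assumes that partial 0.1-point steps are rounded up and that the credit is uncapped, neither of which is stated above.

```python
import math

SLA_TARGET = 99.90      # committed uptime percentage, from the table
CREDIT_PER_STEP = 0.5   # credit percentage per step, from the table
STEP = 0.1              # size of one step in percentage points, from the table


def sla_credit(measured_uptime: float) -> float:
    """Return the credit percentage for a measured uptime percentage.

    Assumptions (not stated in the SLA table): partial steps are rounded
    up to a full 0.1-point step, and the total credit has no cap.
    """
    shortfall = SLA_TARGET - measured_uptime
    if shortfall <= 0:
        return 0.0
    # Round before ceil so float noise (e.g. 2.0000000003) does not add a step.
    steps = math.ceil(round(shortfall / STEP, 6))
    return steps * CREDIT_PER_STEP


if __name__ == "__main__":
    for uptime in (99.95, 99.70, 99.65):
        print(f"uptime {uptime}% -> credit {sla_credit(uptime)}%")
```

Under these assumptions, a measured uptime of 99.70% is two full steps below the commitment. Consult the actual SLA terms for the rounding and cap rules that apply.
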
The following metrics represent measured performance under optimal baseline conditions (single session, warm cache, optimal network):
| Component | Measured p50 | Measured p95 |
|---|---|---|
| Platform Orchestration | 8ms | 12ms |
| Speech-to-Text (STT) | 0.5s | 0.7s |
| LLM Inference | 0.8s | 1.2s |
| Text-to-Speech (TTS) | 0.3s | 0.5s |
| End-to-End Voice Pipeline | 1.6s | 2.4s |
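
The component rows roughly add up to the end-to-end row, which is a quick way to see where the latency budget goes. The sketch below sums the per-stage figures from the table; the names `STAGES`, `pipeline_total`, and `stage_share` are illustrative. Adding percentiles stage by stage is only an approximation, since percentiles of independent stages are not additive in general.

```python
# Per-stage latencies in seconds, copied from the baseline table.
STAGES = {
    "Platform Orchestration": {"p50": 0.008, "p95": 0.012},
    "Speech-to-Text (STT)": {"p50": 0.5, "p95": 0.7},
    "LLM Inference": {"p50": 0.8, "p95": 1.2},
    "Text-to-Speech (TTS)": {"p50": 0.3, "p95": 0.5},
}


def pipeline_total(percentile: str) -> float:
    """Sum the per-stage latency for one percentile column ("p50" or "p95")."""
    return sum(stage[percentile] for stage in STAGES.values())


def stage_share(name: str, percentile: str) -> float:
    """Return the fraction of the summed latency contributed by one stage."""
    return STAGES[name][percentile] / pipeline_total(percentile)


if __name__ == "__main__":
    for percentile in ("p50", "p95"):
        total = pipeline_total(percentile)
        print(f"{percentile}: {total:.3f}s total")
        for name in STAGES:
            print(f"  {name}: {stage_share(name, percentile):.1%}")
```

The sums come to 1.608s and 2.412s, in line with the 1.6s and 2.4s reported for the end-to-end voice pipeline, with LLM inference as the largest single contributor.
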
Concurrent Load Test Results
Platform stability was validated under concurrent session load:
| Test Scenario | Concurrency | Success Rate | Avg Latency |
|---|---|---|---|
| Baseline (Single Session) | 1 | 100% | 1.6s |
| Low Concurrency | 5 | 100% | 1.65s |
| Medium Concurrency | 10 | 100% | 1.7s |
| High Concurrency | 15 | 100% | 1.7s |
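
A test of this shape can be reproduced with a small harness that launches N sessions at once and records success rate and average latency. The following is a sketch under stated assumptions: `fake_session` is a hypothetical stand-in that only sleeps, and it would need to be replaced with a real client call to your Unpod application. No Unpod API is assumed here.

```python
import asyncio
import random
import time
from dataclasses import dataclass


@dataclass
class LoadResult:
    concurrency: int
    success_rate: float  # fraction of sessions that finished without raising
    avg_latency: float   # mean wall-clock seconds over successful sessions


async def fake_session() -> None:
    """Hypothetical stand-in for one session; replace with a real client call."""
    await asyncio.sleep(random.uniform(0.01, 0.02))


async def timed(session_fn) -> tuple[bool, float]:
    """Run one session and return (succeeded, elapsed_seconds)."""
    start = time.perf_counter()
    try:
        await session_fn()
        return True, time.perf_counter() - start
    except Exception:
        return False, time.perf_counter() - start


async def run_load(session_fn, concurrency: int) -> LoadResult:
    """Start `concurrency` sessions at once and aggregate their outcomes."""
    outcomes = await asyncio.gather(
        *(timed(session_fn) for _ in range(concurrency))
    )
    latencies = [elapsed for ok, elapsed in outcomes if ok]
    success_rate = len(latencies) / concurrency
    avg = sum(latencies) / len(latencies) if latencies else float("nan")
    return LoadResult(concurrency, success_rate, avg)


if __name__ == "__main__":
    # Same concurrency levels as the table above.
    for level in (1, 5, 10, 15):
        print(asyncio.run(run_load(fake_session, level)))
```

Averages hide tail behaviour, so a real run would usually also record p95 or p99 per concurrency level.
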
Infrastructure Robustness
| Capability | Status |
|---|---|
| Auto-scaling | Horizontal pod scaling enabled |
| Failover | Multi-region redundancy |
| Connection Pooling | Optimized for concurrent sessions |
| Rate Limiting | Per-tenant throttling |
| Observability | Real-time latency monitoring |
| Data Residency | India region available |
Scalability Architecture
- Horizontal Scaling: Native HPA (Horizontal Pod Autoscaler) for all stateless components
- GPU Node Affinity: Dedicated GPU pools with NVIDIA A10G/L4 for inference workloads
- Regional Infrastructure: Automatic routing through worldwide infrastructure for optimal latency
- Database Scaling: Postgres read replicas, MongoDB ReplicaSet with automatic failover
- SaaS Auto-scaling: Instant autoscaling on SaaS, versus manual capacity planning for self-hosted deployments
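
To make the horizontal scaling item concrete, the sketch below reproduces the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: the desired count is the current count scaled by the ratio of the observed metric to its target, rounded up, with no change when the ratio is within a tolerance of 1. The metric values, replica bounds, and the function name `desired_replicas` are illustrative assumptions; the metrics and thresholds Unpod actually uses are not stated here.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Compute the HPA-style desired replica count.

    Follows the documented Kubernetes rule:
    desired = ceil(current_replicas * current_metric / target_metric),
    skipped when the ratio is within `tolerance` of 1.0.
    The bounds and tolerance defaults here are illustrative.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target, do not scale
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 pods at 90% average CPU against a 60% target scale out.
    print(desired_replicas(4, current_metric=90, target_metric=60))
    # 4 pods at 30% average CPU against a 60% target scale in.
    print(desired_replicas(4, current_metric=30, target_metric=60))
```

The tolerance band avoids replica churn when load hovers near the target.
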
Latency Optimization Techniques
- Streaming STT/TTS: Real-time processing without full-file buffering
- Speculative Decoding: A smaller draft model proposes tokens that the main LLM verifies in parallel, for faster responses
- Same Availability Zone: Co-located services to minimize network latency
- gRPC/WebSocket: Low-overhead protocols for inter-service communication
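
The streaming item above can be illustrated with chained generators: each stage consumes one chunk and emits output immediately, so the first result is available before the rest of the input has been read. This is a toy sketch, not the platform's pipeline; `audio_chunks`, `transcribe_stream`, and `synthesize_stream` are hypothetical names, and the "transcription" is a placeholder string.

```python
from typing import Iterable, Iterator


def audio_chunks(data: bytes, chunk_size: int) -> Iterator[bytes]:
    """Yield fixed-size slices of the input instead of the whole buffer."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def transcribe_stream(chunks: Iterable[bytes]) -> Iterator[str]:
    """Hypothetical streaming STT stage: one placeholder partial per chunk."""
    for index, chunk in enumerate(chunks):
        yield f"partial-{index}:{len(chunk)}B"


def synthesize_stream(partials: Iterable[str]) -> Iterator[bytes]:
    """Hypothetical streaming TTS stage: emits bytes as each partial arrives."""
    for text in partials:
        yield text.encode("utf-8")


if __name__ == "__main__":
    pipeline = synthesize_stream(
        transcribe_stream(audio_chunks(b"x" * 10, chunk_size=4))
    )
    # The first output is produced after only the first chunk is consumed.
    print(next(pipeline))
    # Remaining outputs are produced lazily, one chunk at a time.
    for out in pipeline:
        print(out)
```

Because every stage is lazy, time to first output depends on the size of one chunk, not on the length of the full recording.
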
Notes
- End-to-end latency includes external service providers (STT, LLM, TTS), which contribute to variability under load.
- The platform orchestration layer maintains latency below 15ms regardless of concurrent load.
- Performance optimizations for high-concurrency scenarios are actively being deployed.
- Custom SLA tiers available for enterprise customers with dedicated infrastructure.